The spread of epidemic disease on networks 
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The study of social networks, and in particular the spread of disease on networks, has attracted 
considerable recent attention in the physics community. In this paper, we show that a large class 
of standard epidemiological models, the so-called susceptible/infective/removed (SIR) models can 
be solved exactly on a wide variety of networks. In addition to the standard but unrealistic case of 
fixed infectiveness time and fixed and uncorrelated probability of transmission between all pairs of 
individuals, we solve cases in which times and probabilities are non-uniform and correlated. We also 
consider one simple case of an epidemic in a structured population, that of a sexually transmitted 
disease in a population divided into men and women. We confirm the correctness of our exact 
solutions with numerical simulations of SIR epidemics on networks. 
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I. INTRODUCTION 

Many diseases spread through human populations by 
contact between infective individuals (those carrying the 
disease) and susceptible individuals (those who do not 
have the disease yet, but can catch it). The pattern of 
these disease-causing contacts forms a network. In this 
paper we investigate the effect of network topology on 
the rate and pattern of disease spread. 

Most mathematical studies of disease propagation 
make the assumption that populations are "fully mixed," 
meaning that an infective individual is equally likely to 
spread the disease to any other member of the population 
or subpopulation to which they belong ||, ||. In the 
limit of large population size this assumption allows one 
to write down nonlinear differential equations governing, 
for example, numbers of infective individuals as a func- 
tion of time, from which solutions for quantities of inter- 
est can be derived, such as typical sizes of outbreaks and 
whether or not epidemics occur. (Epidemics are defined 
as outbreaks that affect a non-zero fraction of the popula- 
tion in the limit of large system size.) Epidemic behavior 
usually shows a phase transition with the parameters of 
the model — a sudden transition from a regime without 
epidemics to one with. This transition happens as the 
"reproductive ratio" Rq of the disease, which is the frac- 
tional increase per unit time in the number of infective 
individuals, passes though one. 

Within the class of fully mixed models much elabo- 
ration is possible, particularly concerning the effects of 
age structure in the population, and population turnover. 
The crucial element however that all such models lack is 
network topology. It is obvious that a given infective in- 
dividual does not have equal probability of infecting all 
others; in the real world each individual only has contact 
with a small fraction of the total population, although 
the number of contacts that people have can vary greatly 
from one person to another. The fully mixed approxima- 
tion is made primarily in order to allow the modeler to 
write down differential equations. For most diseases it is 
not an accurate representation of real contact patterns. 

In recent years a large body of research, particularly 



within the statistical physics community, has addressed 
the topological properties of networks of various kinds, 
from both theoretical and empirical points of view, and 
studied the effects of topology on processes taking place 
on those networks ||J Social networks 0, [|, 
technological networks [flO|, |TT| , |l2| , and biological 
networks (l4[ [H| [ll], [L7|> |18fl have all been examined 
and modeled in some detail. Building on insights gained 
from this work, a number of authors have pursued a 
mathematical theory of the spread of disease on net- 
works gj| |o[ p, || n |§. This is als ° the to P ic 

of the present paper, in which we show that a large class 
of standard epidemiological models can be solved exactly 
on networks using ideas drawn from percolation theory. 

The outline of the paper is as follows. In Section II 
we introduce the models studied. In Section III we show 



how percolation ideas and generating function methods 
can be used to provide exact solutions of these models on 
simple networks with uncorrelated transmission probabil- 
ities. In Section IV we extend these solutions to cases in 



which probabilities of transmission are correlated, and in 
Section [v] to networks representing some types of struc- 
tured populations. In Section VI we give our conclusions. 



II. EPIDEMIC MODELS AND PERCOLATION 

The mostly widely studied class of epidemic models, 
and the one on which we focus in this paper, is the class of 
susceptible/infective/removed or SIR models. The orig- 
inal and simplest SIR model, first formulated (though 
never published) by Lowell Reed and Wade Hampton 
Frost in the 1920s, is as follows. A population of TV 
individuals is divided into three states: susceptible (S), 
infective (I) , and removed (R) . In this context "removed" 
means individuals who are either recovered from the dis- 
ease and immune to further infection, or dead. (Some 
researchers consider the R to stand for "recovered" or 
"refractory." Either way, the meaning is the same.) In- 
fective individuals have contacts with randomly chosen 
individuals of all states at an average rate (3 per unit 
time, and recover and acquire immunity (or die) at an 
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average rate 7 per unit time. If those with whom infec- 
tive individuals have contact are themselves in the sus- 
ceptible state, then they become infected. In the limit of 
large N this model is governed by the coupled nonlinear 
differential equations pj: 



ds a- di a- 



dr 

7Z ' 



7«! 



(1) 



where s(t), i(t), and r(t) are the fractions of the pop- 
ulation in each of the three states, and the last equa- 
tion is redundant, since necessarily s + i + r = 1 at all 
times. This model is appropriate for a rapidly spreading 
disease that confers immunity on its survivors, such as 
influenza. In this article we will consider only diseases of 
this type. Diseases that are endemic because they prop- 
agate on timescales comparable to or slower than the 
rate of turnover of the population, or because they con- 
fer only temporary immunity, are not well represented by 
this model; other models have been developed for these 
cases j|. 

The model described above assumes that the popula- 
tion is fully mixed, meaning that the individuals with 
whom a susceptible individual has contact are chosen at 
random from the whole population. It also assumes that 
all individuals have approximately the same number of 
contacts in the same time, and that all contacts transmit 
the disease with the same probability. In real life none 
of these assumptions is correct, and they are all grossly 
inaccurate in at least some cases. In the work presented 
here we remove these assumptions by a series of modifi- 
cations of the model. 

First, as many others have done, we replace the "fully 
mixed" aspect with a network of connections between in- 
dividuals liT 
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Individ- 



_ Jl[ [22} £3] £4|, |2§ |2£ 

uals have disease-causing contacts only along the links in 
this network. We distinguish here between "connections" 
and actual contacts. Connections between pairs of indi- 
viduals predispose those individuals to disease-causing 
contact, but do not guarantee it. An individual's con- 
nections are the set of people with whom the individual 
may have contact during the time he or she is infective — 
people that the individual lives with, works with, sits 
next to on the bus, and so forth. 

We can vary the number of connections each person has 
with others by choosing a particular degree distribution 
for the network. (Recall that the degree of a vertex in 
a network is the number of other vertices to which it is 
attached.) For example, in the case of sexual contacts, 
which can transmit STDs, the degree distribution has 
been found to follow a power-law form j^] . By placing the 
model on a network with a power-law degree distribution 
we can emulate this effect in our model. 

Our second modification of the model is to allow the 
probability of disease-causing contact between pairs of 
individuals who have a connection to vary, so that some 
pairs have higher probability of disease transmission than 
others. 

Consider a pair of individuals who are connected, one 



of whom i is infective and the other j susceptible. Sup- 
pose that the average rate of disease-causing contacts be- 
tween them is ry, and that the infective individual re- 
mains infective for a time r^. Then the probability 1 — TJy 
that the disease will not be transmitted from i to j is 



1 — Ti,\ = lim (1 — r;jSt) 



T i/ St — c - r a T i 



and the probability of transmission is 



(2) 



(3) 



Some models, particularly computer simulations, use dis- 
crete time-steps rather than continuous time, in which 
case instead of taking the limit in Eq. (||) we simply set 
5t = 1, giving 



T; 



l-a-r^rs 



(4) 



where r is measured in time-steps. 

In general r^ and n will vary between individuals, so 
that the probability of transmission also varies. Let us 
assume initially that these two quantities are iid ran- 
dom variables chosen from some appropriate distribu- 
tions P(r) and P(t). (We will relax this assumption 
later.) The rate need not be symmetric — the prob- 
ability of transmission in either direction might not be 
the same. In any case, Ty is in general not symmetric 
because of the appearance of Tj in Eqs. (||) and (|I|). 

Now here's the trick: because ry and Tj are iid random 
variables, so is Ty , and hence the a priori probability of 
transmission of the disease between two individuals is 
simply the average T of Ty over the distributions P(r) 
and P(t), which is 

T = {Tij) = 1 - / dr dr P(r)P(r) e" rr (5) 
Jo 

for the continuous time case or 



T = 1 



dr 



T = 



P(r)P(T)(l -r) T 



(6) 



for the discrete case |2^| . We call T the "transmissibility" 
of the disease. It is necessarily always in the range < 
T < 1. 

Thus the fact that individual transmission probabil- 
ities vary makes no difference whatsoever; in the pop- 
ulation as a whole the disease will propagate as if all 
transmission probabilities were equal to T. We demon- 
strate t he tru th of this result by explicit simulation in 
Section III E . It is this result that makes our models 
solvable. Cases in which the variables r and t are not 
iid are trickier, but, as we will show, these are sometimes 
solvable as well. 

We note further that more complex disease transmis- 
sion models, such as SEIR models in which there is an 
infected-but-not-infective period (E) , are also covered by 
this formalism. The transmissibility is essentially just 
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the integrated probability of transmission of the disease 
between two individuals. The precise temporal behavior 
of infectivity and other variables is unimportant. Indeed 
the model can be generalized to include any temporal 
variation in infectivity of the infective individuals, and 
transmission can still be represented correctly by a sim- 
ple transmissibility variable T, as above. 

Now imagine watching an outbreak of the disease, 
which starts with a single infective individual, spreading 
across our network. If we were to mark or "occupy" each 
edge in the graph across which the disease is transmit- 
ted, which happens with probability T, the ultimate size 
of the outbreak would be precisely the size of the clus- 
ter of vertices that can be reached from the initial vertex 
by traversing only occupied edges. Thus, the model is 
precisely equivalent to a bond percolation model with 
bond occupation probability T on the graph represent- 
ing the community. The connection between the spread 
of disease and percolation was in fact one of the origi- 
nal motivations for the percolation model itself |29| ], but 
seems to have been formulated in the manner presented 
here first by Grassberger pQjfor the case of uniform r 
and r, and by Warren et al. |23[ [24[ | for the non-uniform 
case. 

In the next section we show how the percolation prob- 
lem can be solved on random graphs with arbitrary de- 
gree distributions, giving exact solutions for the typical 
size of outbreaks, presence of an epidemic, size of the epi- 
demic (if there is one) , and a number of other quantities 
of interest. 



III. EXACT SOLUTIONS ON NETWORKS 
WITH ARBITRARY DEGREE DISTRIBUTIONS 

One of the most important results to come out of 
empirical work on networks is the finding that the de- 
gree distributions of many networks are highly right- 
skewed. In other words, most vertices have only a low 
degree, but there are a small number whose degree is 
very high [|| |?], [TTJ, [H| . The network of sexual contacts 
discussed above provides one example of such a distri- 
bution H . It is known that the presence of highly con- 
nected vertices can have a disproportionate effect on cer- 
tain properties of the network. Recent work suggests 
that the same may be true for disease propagation on 
networks |2l], |3^] , and so it will be important that we in- 
corporate non-trivial degree distributions in our models. 
As a first illustration of our method therefore, we look at 
a simple class of unipartite graphs studied previously by a 

variety of authors || |J, H M E01 H 11 EH H3> II > in 

which the degree distribution is specified, but the graph 
is in other respects random. 

Our graphs are simply defined. One specifies the de- 
gree distribution by giving the properly normalized prob- 
abilities pk that a randomly chosen vertex has degree k. 
A set of N degrees {h}, also called a degree sequence, is 
then drawn from this distribution and each of the N ver- 



tices in the graph is given the appropriate number fcj of 
"stubs" — ends of edges emerging from it. Pairs of these 
stubs are then chosen at random and connected together 
to form complete edges. Pairing of stubs continues until 
none are left. (If an odd number of stubs is by chance 
generated, complete pairing is not possible, in which case 
we discard one ki and draw another until an even number 
is achieved.) This technique guarantees that the graph 
generated is chosen uniformly at random from the set of 
all graphs with the selected degree sequence. 

All the results given in this section are averaged over 
the ensemble of possible graphs generated in this way, in 
the limit of large graph size. 



A. Generating functions 

We wish then to solve for the average behavior of 
graphs of this type under bond percolation with bond oc- 
cupation probability T. We will do this using generating 
function techniques p3[ . Following Newman et al. [ |36| , 
we define a generating function for the degree distribu- 
tion thus: 



Go[x) = ^Pk r - 



(7) 



k=0 



Note that Go(l) = Sfc-Pfe = 1 if Pfc is a properly normal- 
ized probability distribution. 

This function encapsulates all of the information about 
the degree distribution. Given it, we can easily recon- 
struct the distribution by repeated differentiation: 



Pk 



1 d k G 



kl dx k 



(8) 



a:=0 



We say that the generating function Go "generates" the 
distribution p^. The generating function is easier to work 
with than the degree distribution itself because of two 
crucial properties: 

Powers If the distribution of a property k of an ob- 
ject is generated by a given generating function, then the 
distribution of the sum of k over m independent realiza- 
tions of the object is generated by the m th power of that 
generating function. For example, if we choose m ver- 
tices at random from a large graph, then the distribution 
of the sum of the degrees of those vertices is generated 
by [G (x)] m . 

Moments The mean of the probability distribution 
generated by a generating function is given by the first 
derivative of the generating function, evaluated at 1. For 
instance, the mean degree z of a vertex in our network is 
given by 



<fc) = £> fc = G{,(1). 



(9) 
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Higher moments of the distribution can be calculated 
from higher derivatives also. In general, we have 



dx 



G Q (x) 



(10) 



A further observation that will also prove crucial is the 
following. While Go above correctly generates the dis- 
tribution of degrees of randomly chosen vertices in our 
graph, a different generating function is needed for the 
distribution of the degrees of vertices reached by follow- 
ing a randomly chosen edge. If we follow an edge to the 
vertex at one of its ends, then that vertex is more likely 
to be of high degree than is a randomly chosen vertex, 
since high-degree vertices have more edges attached to 
them than low-degree ones. The distribution of degrees 
of the vertices reached by following edges is proportional 
to kpk, and hence the generating function for those de- 
grees is 



J2 k k PkX k 



G' (x) 
'G' (1Y 



(11) 



In general we will be concerned with the number of ways 
of leaving such a vertex excluding the edge we arrived 
along, which is the degree minus 1. To allow for this, we 
simply divide the function above by one power of x, thus 
arriving at a new generating function 



G^x) 



G'Jx 



1 



G' 



1} =-G' (x), 



(12) 



where z is the average vertex degree, as before. 

In order to solve the percolation problem, we will also 
need generating functions Gq(x]T) and G\{x;T) for the 
distribution of the number of occupied edges attached to 
a vertex, as a function of the transmissibility T. These 
are simple to derive. The probability of a vertex having 
exactly m of the k edges emerging from it occupied is 
given by the binomial distribution (^)T m (l - T) k -'"\ 
and hence the probability distribution of m is generated 
by 



G (x;T) = J2 £ 



m— k—rn 
oo k 



Pk\ \T m (l-T) k - m x r ' 
m . 



k=0 m=0 



E»EL (^r(i-T) 



\k — rn 



J2Pk(l-T + xT) k 



k=0 



G (l + (z-l)T). 



(13) 



Similarly, the probability distribution of occupied edges 
leaving a vertex arrived at by following a randomly cho- 
sen edge is generated by 



G 1 (x;T)=G l (l + (x-l)T). 



(14) 



Note that, in our notation 

G (x;l) = G (x), 

G (1;T) = G (l), 

G (1;T) = TG' (1), 



(15a) 
(15b) 
(15c) 



and similarly for G\. (G' Q (x;T) here represents the 
derivative of Gq(x; T) with respect to its first argument.) 



B. Outbreak size distribution 

The first quantity we will work out is the distribution 
P 8 {T) of the sizes s of outbreaks of the disease on our 
network, which is also the distribution of sizes of clusters 
of vertices connected together by occupied edges in the 
corresponding percolation model. Let Hq(x;T) be the 
generating function for this distribution: 



H (x;T)=Y / Ps(T) 



(16) 



By analogy with the previous section we also define 
Hi(x;T) to be the generating function for the cluster 
of connected vertices we reach by following a randomly 
chosen edge. 

Now, following Ref. |3(| we observe that Hi can be bro- 
ken down into an additive set of contributions as follows. 
The cluster reached by following an edge may be: 

1. a single vertex with no occupied edges attached to 
it, other than the one along which we passed in 
order to reach it; 

2. a single vertex attached to any number m > 1 of 
occupied edges other than the one we reached it by, 
each leading to another cluster whose size distribu- 
tion is also generated by H\. 

We further note that the chance that any two finite clus- 
ters that are attached to the same vertex will have an 
edge connecting them together directly goes as TV -1 with 
the size N of the graph, and hence is zero in the limit 
TV — > oo. In other words, there are no loops in our clus- 
ters; their structure is entirely tree-like. 

Using these results, we can express H\(x;T) in a 
Dyson-equation-like self-consistent form thus: 



H 1 {x;T)=xG l {H 1 {x;T);T), 



(17) 



Then the size of the cluster reachable from a randomly 
chosen starting vertex is distributed according to 



H {x;T)=xG {H x {x;T);T), 



(18) 



It is straightforward to verify that for the special case 
T = 1 of 100% transmissibility, these equations reduce 
to those given in Ref. ^ for component size in ran- 
dom graphs with arbitrary degree distributions. Equa- 
tions ( p"7| ) and ([l8]) provide the solution for the more 



5 



general case of finite transmissibility which applies to 
SIR models. Once we have H (x;T), we can extract the 
probability distribution of clusters P s (T) by differentia- 
tion using Eq. (|J) on Ho. In most cases however it is 
not possible to find arbitrary derivatives of Hq in closed 
form. Instead we typically evaluate them numerically. 
Since direct evaluation of numerical derivatives is prone 
to machine precision problems, we recommend evaluat- 
ing the derivatives by numerical contour integration using 
the Cauchy formula: 



, v 1 d s H 

Ps(T) = -r 



si dx s 



1 
2tti 



C 



s+l 



dC, (19) 



where the integral is over the unit circle p4( |. It is pos- 
sible to find the first thousand derivatives of a function 
without difficulty using this method |36| . By this method 
then, we can find the exact probability P s that a partic- 
ular outbreak of our disease will infect s people in total, 
function of the transmissibility T. 



C. Outbreak sizes and the epidemic transition 

Although in general we must use numerical methods to 
find the complete distribution P s of outbreak sizes from 
Eq. (|l9|), we can find the mean outbreak size in closed 
form. Using Eq. (||), we have 

(s) = H' (1;T) = 1 + G' (1;T)H[(1;T), (20) 



where we have made use of the fact that the generating 
functions are 1 at x = 1 if the distributions that they gen- 
erate are properly normalized. Differentiating Eq. (17), 
we have 



H' 1 (l;T) = l + G' 1 (l;T)H' 1 (l;T) = 



1 



l-G[(l;T)' 



and hence 



= 1 



= l 



TG' a {\) 
l-TGi(l)' 



(21) 



(22) 



Given Eqs. (@), @, ©, and @, we can then evaluate 
this expression to get the mean outbreak size for any 
value of T and degree distribution. 

We note that Eq. (§|) diverges when TG[(1) = 1. This 
point marks the onset of an epidemic; it is the point at 
which the typical outbreak ceases to be confined to a 
finite number of individuals, and expands to fill an ex- 
tensive fraction of the graph. The transition takes place 
when T is equal to the critical transmissibility T c , given 
by 



T r = 



1 



G' (l) 



GUI) E**(*~1)P* 



(23) 



For T > T c , we have an epidemic, or "giant compo- 
nent" in the language of percolation. We can calculate 



the size of this epidemic as follows. Above the epidemic 
threshold Eq. (|17|) is no longer valid because the giant 
component is extensive and therefore can contain loops, 
which destroys the assumptions on which Eq. (|l7j ) was 
based. The equation is valid however if we redefine Hq to 
be the generating function only for outbreaks other than 
epidemic outbreaks, i.e., isolated clusters of vertices that 
are not connected to the giant component. These how- 
ever do not fill the entire graph, but only the portion of it 
not affected by the epidemic. Thus, above the epidemic 
transition, we have 



Ho(l;T) = 5^P a = l-S(T), 



(24) 



where S(T) is the fraction of the population affected by 
the epidemic. Rearranging Eq. (|2J) for S and making 
use of Eq. (Eq), we find that the size of the epidemic is 



S(T) = l-G (u;T), 



(25) 



where u = Hi(l; T) is the solution of the self-consistency 
relation 



u = Gi(u; T). 



(26) 



were given previ- 



Results equivalent to Eqs. (|22|) to 
ously in a different context in Ref. Phi. 

Note that it is not the case, even above T c , that all 
outbreaks give rise to epidemics of the disease. There 
are still finite outbreaks even in the epidemic regime. 
While this appears very natural, it stands nonetheless in 
contrast to the standard fully mixed models, for which 
all outbreaks give rise to epidemics above the epidemic 
transition point. In the present case, the probability of 
an outbreak becoming an epidemic at a given T is simply 
equal to S(T). 



D. Degree of infected individuals 

The quantity u defined in Eq. (|2FJ) has a simple inter- 
pretation: it is the probability that the vertex at the end 
of a randomly chosen edge remains uninfected during an 
epidemic (i.e., that it belongs to one of the finite compo- 
nents). The probability that a vertex becomes infected 
via one of its edges is thus v = 1 — T + Tu, which is the 
sum of the probability 1 — T that the edge is unoccupied, 
and the probability Tu that it is occupied but connects 
to an uninfected vertex. The total probability of being 
uninfected if a vertex has degree k is v k , and the proba- 
bility of having degree k given that a vertex is uninfected 
is PkV k I J2kPk vk ~ Pk yk /Gq(v), which distribution is 
generated by the function Go{vx)/Gq{v). Differentiating 
and setting x = 1, we then find that the average degree 
£out of vertices outside the giant component is 



^out 



vG' (v) _ vGi(v) _ u[l-T + Tu] 

~gM ~~gM z ~ T~s 



(27) 
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Similarly the degree distribution for an infected vertex is 
generated by [G (x) — G (vx)}/[1 — G (v)], which gives 
a mean degree z- ln for vertices in the giant component of 



_ l-vGi(v) 
Zin ~ l-G Q (v) Z 



l — u[l—T + Tu] 
S 



(28) 



Note that 1 — S = Gq(u;T) < u, since all coefficients 
of Gq(x; T) are by definition positive (because they form 
a probability distribution) and hence Gq(x;T) has only 
positive derivatives, meaning that it is convex everywhere 
on the positive real line within its domain of convergence. 
Thus, from Eq. (f27j), z ou t < z. Similarly, z; n > z, and 
hence, as we would expect, the mean degree of infected 
individuals is always greater than or equal to the mean 
degree of uninfected ones. Indeed, the probability of a 
vertex being infected, given that it has degree k, goes as 
1 — v k = I — e _felog ( 1 /' u ) , i.e., tends exponentially to unity 
as degree becomes large. 



E. An example 

Let us now look at an application of these results to a 
specific example of disease spreading. First of all we need 
to define our network of connections between individuals, 
which means choosing a degree distribution. Here we will 
consider graphs with the degree distribution 



Pk 





Ck~ 



-k/K 



for k = 
for k > 1. 



(29) 



where C, a, and k are constants. In other words, the 
distribution is a power-law of exponent a with an ex- 
ponential cutoff around degree k. This distribution has 
been studied before by various authors |@, |t], flO) . It 
makes a good example for a number of reasons: 

1. distributions of this form are seen in a variety of 
real- world networks (?|, 45 ; 



2. it includes pure power-law and pure exponential 
distributions, both of which are also seen in various 
networks (7[ Ell [l2[ |3lj , as special cases when k — > 
oo or a — > 0; 

3. it is normalizable and has all moments finite for any 
finite k; 

The constant C is fixed by the requirement of normal- 
ization, which gives C = [Liaic- 1 ^)}- 1 and hence 



Pk 



fc- a e~ fe / K 
Liafc- 1 /") 



for k > 1, 



(30) 



where Li„(x) is the nth polylogarithm of x. 

We also need to choose the distributions P(r) and P(t) 
for the transmission rate and the time spent in the in- 
fective state. For the sake of easier comparison with 
computer simulations we use discrete time and choose 



both distributions to be uniform, with r real in the range 
< r < r max and r integer in the range 1 < r < r max . 
The transmissibility T is then given by Eq. (0). From 
Eq. @), we have 



and 



G (x) 



Gi(x) 



Li a (xe" 1 / K ) 
= Li Q (e-V«) ' 

Lia-ifse- 1 /") 



(31) 



aLia-ite-V*)' 
Thus the epidemic transition in this model occurs at 
Li^e" 1 ^) 



(32) 



Lia-2(e-V'«)-Li a _ 1 (e-V'«) 



(33) 



Below this value of T there are only small (non-epidemic) 
outbreaks, which have mean size 



(a) = 1 



T[Li Q _ 1 (e- 1 /")]2 



Li a (e-!/«)[(T + 1) Li a _ 1 (e- 1 /«) - TLi a _ 2 ( e - 1 /«)] ' 

(34) 

Above it, we are in the region in which epidemics can oc- 
cur, and they affect a fraction S of the population in the 
limit of large graph size. We cannot solve for S in closed 
form, but we can solve Eqs. (EE) and (|26| ) by numerical 
iteration and hence find S. 

In Fig. ^ we show the results of calculations of the 
average outbreak size and the size of epidemics from the 
exact formulas, compared with explicit simulations of the 
SIR model on networks with the degree distribution |3(]) . 
Simulations were performed on graphs of TV = 100 000 
vertices, with a = 2, a typical value for networks seen in 
the real world, and k = 5, 10, and 20 (the three curves in 
each panel of the figure) . For each pair of the parameters 
a and n for the network, we simulated 10 000 disease 
outbreaks each for (r, r) pairs with r max from 0.1 to 1.0 
in steps of 0.1, and r max from 1 to 10 in steps of 1. Fig. |l| 
shows all of these results on one plot as a function of the 
transmissibility T, calculated from Eq. (^|). 

The figure shows two important things. First, the 



points corresponding to different values of ; 



and 



but the same value of T fall in the same place and the 
two-parameter set of results for r and t collapses onto a 
single curve. This indicates that the arguments leading 
to Eqs. (jE)) an d (| ) are correct (as also demonstrated by 
Warren et al. |23| , 24j ) and that the statistical properties 
of the disease outbreaks really do depend only on the 
transmissibility T, and not on the individual rates and 
times of infection. Second, the data clearly agree well 
with our analytic results for average outbreak size and 
epidemic size, confirming the correctness of our exact so- 
lution. The small disagreement between simulations and 
exact solution for (s) close to the epidemic transition in 
the lower panel of the figure appears to be a finite size 
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0.2 0.4 0.6 
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FIG. 1: Epidemic size (top) and average outbreak size (bot- 
tom) for the SIR model on networks with degree distributions 
of the form (^) as a function of transmissibility. Solid lines 
are the exact solutions, Eqs. ( p5| ) and (^), for a = 2 and (left 
to right in each panel) k = 20, 10, and 5. Each of the points is 
an average result for 10 000 simulations on graphs of 100 000 
vertices each with distributions of r and r as described in the 
text. 



effect, due to the relatively small system sizes used in the 
simulations. 

To emphasize the difference between our results and 
those for the equivalent fully mixed model, we compare 
the position of the epidemic threshold in the two cases. 
In the case a = 2, n = 10 (the middle curve in each 
frame of Fig. |l|), our analytic solution predicts that the 
epidemic threshold occurs at T c = 0.329. The simula- 
tions agree well with this prediction, giving T c — 0.32(2). 
By contrast, a fully mixed SIR model in which each infec- 
tive individual transmits the disease to the same average 
number of others as in our network, gives a very different 
prediction of T c = 0.558. 



IV. CORRELATED TRANSMISSION 
PROBABILITIES 

It is possible to imagine many cases in which the prob- 
abilities of transmission of a disease from an infective 
individual to those with whom he or she has connections 
are not iid random variables. In other words, the proba- 
bilities of transmission from a given individual to others 
could be drawn from different distributions for different 
individuals. This allows, for example, for cases in which 
the probabilities tend either all to be high or all to be 
low but are rarely a mixture of the two. In this section, 
we show how the model of Section III can be generalized 
to allow for this. 

Suppose that the transmission rates r for transmission 



from an infective individual i to each of the fcj others with 
whom they have connections are drawn from a distribu- 
tion Pi (r) , which can vary from one individual to another 
in any way we like. Thus the a priori probability of trans- 
mission from i to any one of his or her neighbors in the 
network is 



Ti = 1 



Jo 



dr dTPi(r)P(T) e" 



(35) 



One could of course also allow the distribution from 
which the time r is drawn to vary from one individual 
to another, although this doesn't result in any functional 
change in the theory, so it would be rather pointless. In 
any case, the formalism developed here can handle this 
type of dependency perfectly well. 

Following Eq. (jl3|), we note that in the percolation rep- 
resentation of our model the distribution of the number 
of occupied edges leading from a particular vertex is now 
generated by the function 



<*>(*; Pi}) = ^EEQW-t, 

i=0 m=0 v ' 
1 N 

= TfE( 1 +( a: - 1 ) T ^- ( 36 ) 



ki—m^m 



i=0 



And similarly, the probability distribution of occupied 
edges leaving a vertex arrived at by following a randomly 
chosen edge is generated by 



Gi(x; {T,}) = 



E^(! + (x-l)T t ) 



fei-i 



(37) 



Clearly these reduce to Eqs. ( |l3| ) and ( |l4| ) when Ti is 
independent of i. 

With these definitions of the basic generating func- 
tions, our derivations proceed as before. The complete 
distribution of the sizes of outbreaks of the disease, ex- 
cluding epidemic outbreaks if there are any, is generated 
by 



where 



H (x; {Ti}) = xGoiH^x; {Ti}); {Ti}), (38) 



H^x; {Ti}) = xdiH^x; {Ti}); {Ti}). (39) 



The average outbreak size when there is no epidemic 
is given by Eq. ( p2f ) as before, and the size of epi- 
demics above the epidemic transition is given by Eqs. ( p5| ) 
and (|2rj). The transition itself occurs when G'^il; {Ti}) = 
1 and, substituting for G\ from Eq. (|37]), we can also 
write this in the form 



E **[(*» = 0. 



(40) 



i=0 



In fact, it is straightforward to convince oneself that when 
the sum on the left-hand side of this equation is greater 



8 



than zero epidemics occur, and when it is less than zero 
they do not. 

For example, consider the special case in which the 
distribution of transmission rates P(r) depends on the 
degree of the vertex representing the infective individual. 
One could imagine, for example, that individuals with a 
large number of connections to others tend to have lower 
transmission rates than those with only a small number. 
In this case Tj is a function only of fcj and hence we have 



G 



1 N 

o(x;{T k }) = -J2(l + (x-l)T ki ) k 



i=0 



£> fe (l + (a;-l)T fe ) fe , (41) 



k=0 



and 



Gi(x;{T k }) 



(42) 



where T k is the mean transmissibility for vertices of de- 
gree k. 

One can also treat the case in which the transmissi- 
bility is a function of the number of connections which 
the individual being infected has. If the probability of 
transmission to an individual with degree k is U k , then 
we define 



G (x;{U k }) = J2P^ k 



(43) 



As a concrete example of the developments of this sec- 
tion, consider the physically plausible case in which the 
transmissibility T depends inversely on the degree of the 
infective individual: T k — T\Jk. Then from Eq. (pfcj ) we 
find that there is epidemic behavior only if 



1' 



(47) 



regardless of the degree distribution. Since T lies strictly 
between zero and one however, this is impossible. In 
networks of this type, we therefore conclude that diseases 
cannot spread. Only if transmissibilities fall off slower 
than inversely with degree in at least some part of their 
range are epidemics possible. One plausible way in which 
this might happen is if T k ~ (To + k) . In this case it is 
straightforward to show that epidemics are possible for 
some degree distributions for some values of To. 

Other extensions of the model are possible too. One 
area of current interest is models incorporating vaccina- 
tion jy^, Q. Disease propagation on networks incor- 
porating vaccinated individuals can be represented as a 
joint site/bond percolation process, which can also be 
solved exactly , both in the case of uniform indepen- 
dent vaccination probability (i.e., random vaccination of 
a population) and in the case of vaccination that is corre- 
lated with properties of individuals such as their degree 
(so that vaccination can be directed at the so-called core 
group of the disease-carrying network — those with the 
highest degrees). 



Gi(z;{£/ fe }) 



E fc fc Pfc [l + (z fc - 1 -l)l4 
Efc k Pk 



(44) 



and then the calculation of cluster size distribution and 
so forth proceeds as before. 

Further, one can solve the case in which probability of 
transmission of the disease depends on both the proba- 
bilities of giving it and catching it, which are arbitrary 
functions T k and U k of the numbers of connections of the 
infective and susceptible individuals. (This means that 
transmission from a vertex with degree j to a vertex with 
degree k occurs with a probability equal to the product 
TjU k .) The appropriate generating functions for this case 
are 

G (x;{T k },{U k }) = ^> fe (l + (x-l)T fe ) fe , (45) 



Gi(x;{T fc },{t^}) = 

E fc k Pk [l + ((1 + (x - l)^.)*- 1 - l)U k ) 



(46) 



and indeed Eqs. (|4|) to ( f44| ) can be viewed as special 
cases of these equations when either T k = 1 or U k = 1 
for all k. Note that G (x; {U k }) and G (x; {T k } , {U k }) 
are both independent of {U k }, since the probability of a 
randomly-chosen infective individual having the disease 
is unity, regardless of the probability that they caught it 
in the first place. 



V. STRUCTURED POPULATIONS 

The models we have studied so far have made use of 
simple unipartite graphs as the substrate for the spread 
of disease. These graphs may have any distribution we 
choose of the degrees of their vertices, but in all other re- 
spects are completely random. Many of the really inter- 
esting cases of disease spreading take place on networks 
that have more structure than this. Cases that have been 
studied previously include disease spreading among chil- 
dren who attend a common school and among patients 
in different wards of a hospital between whom pathogens 
are communicated by peripatetic caregivers [ |47j ]. Here, 
we give just one example of disease spreading in a pop- 
ulation with a very simple structure. The example we 
consider is the spread of a sexually transmitted disease. 
The important structural element of the population in 
this case is its division into men and women. 



A. Bipartite populations 

Consider then a population of M men and N women, 
who have distributions pj , q k of their numbers j and k of 
possibly disease-causing contacts with the opposite sex 
(connections in our nomenclature). In a recent study of 
2810 respondents Liljeros et al. B recorded the numbers 
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FIG. 2: Distributions of the numbers of sexual contacts of 
men and women in the study of Liljeros et al. |B. The his- 
togram is cumulative, meaning that the vertical axis indicates 
the fraction of individuals studied who have greater than or 
equal to the number of contacts specified on the horizontal 
axis. Both distributions approximately follow power laws — 
straight lines on the logarithmic axes used here. Inset: the 
bipartite form of the modeled network of contacts. 



of sexual partners of men and women over the course of 
a year and found the distributions pj , qu shown in Fig. |^. 
As the figure shows, the distributions appear to take a 
power-law form pj ~ j Qm , q k ~ k af , with exponents a m 
and af that fall in the range 3.1 to 3.3 for both men 
and women j52]. (The exponent for women seems to be 
a little higher than that for men, but the difference is 
smaller than the statistical error on the measurement.) 

We will assume that the disease of interest is transmit- 
ted primarily by contacts between men and women (true 
only for some diseases in some communities p8|), so that, 
to a good approximation, the network of contacts is bi- 
partite, as shown in the inset of Fig. g. That is, there 
are two types of vertices representing men and women, 
and edges representing connections run only between ver- 
tices of unlike kinds. With each edge we associate two 
transmission rates, one of which represents the probabil- 
ity of disease transmission from male to female, and the 
other from female to male. These rates are drawn from 
appropriate distributions as before, as are the times for 
which men and women remain infective. Also as before, 
however, it is only the average integrated probability of 
transmission in each direction that matters for our perco- 
lation model, so that we have two transmissibilities T m f 
and Tf m for the two directions |5j| . 

We define two pairs of generating functions for the de- 



gree distributions of males and females: 

fo(x) = J^P^' M x ) = -/oW. ( 48a ) 

3 M 

go(x) = ^2q k x k , g 1 (x) = -g' Q {x), (48b) 



where /i and v are the averages of the two degree distri- 
butions, and are related by 



pL v 

M = TV' 



(49) 



since the total numbers of edges ending at male and fe- 
male vertices are necessarily the same. Using these func- 
tions we further define, as before 

fo(x;T) = f (l + (x-l)T), (50a) 

fi(x;T) = fi(l + (x-l)T), (50b) 

9o(x; T) = 00(1 + (x - 1)T), (50c) 

gi(x;T) = g x (l + (x - 1)T). (50d) 

Now consider an outbreak that starts at a single indi- 
vidual, who for the moment we take to be male. From 
that male the disease will spread to some number of fe- 
males, and from them to some other number of males, 
so that after those two steps a number of new males will 
have contracted the disease, whose distribution is gener- 
ated by 

F (x;T mf ,T fm ) = fo(gi(x;T fm );T m f). (51) 

For a disease arriving at a male vertex along a randomly 
chosen edge we similarly have 

Fi(x;T mf ,T fm ) = f 1 {g x {x;Tf m );T m f). (52) 

And one can define the corresponding generating func- 
tions Go and G\ for the vertices representing the females. 

Using these generating functions, we can now calcu- 
late generating functions Ho and Hi for the sizes of out- 
breaks of the disease in terms either of number of women 
or of number of men affected. The calculation proceeds 
exactly as in the unipartite case, and the resulting equa- 
tions for Hq and Hi are identical to Eqs. ( |l7| ) and (|Tg|). 
We can also calculate the average outbreak size and the 
size of an epidemic outbreak, if one is possible, from 
Eqs. (|22|), (|25|), and (p6|). The average outbreak size for 
males, for example, is 



= 1 



Fp{l', T m f, Tfm) 

1 - F{(l;T fm ,T mf ) 
T mf T fm f (l)g[(l) 
l-T mf T fm f[(l)g[(l) 



The epidemic transition takes place 
F{(l;T m f ,Tf m ) — 1, or equivalently when 

TmfTfmfiil)^!) = 1, 



(53) 
when 

(54) 



10 



a 



0.4 



0.0 



2.5 



3.0 3.5 
exponent a 



4.0 



FIG. 3: The critical transmissibility T c for the model of 
a sexually transmitted disease discussed in the text. T c is 
greater than zero and less than one only in the small range 
3 < a < 3.4788 of the exponent a. 



and hence the epidemic threshold takes the form of a 
hyperbola in T m f-Tf m space: 



E,-i(7-i)Pi£**(*-i)p* 



(55) 

Note that this expression is symmetric in the variables 
describing the properties of males and females. Although 
we derived it by considering the generating function for 
males F\, we get the same threshold if we consider G\ 
instead. Eq. ( p3| ) is not symmetric in this way, so that 
the typical numbers of males and females affected by an 
outbreak may be different. On the other hand Eq. ( pa ) 
involves the transmissibilities T m f and Tf m only in the 
form of their product, and hence the quantities of interest 
are a function only of a single variable T m fTf m . 

The generalizations of Section IV , where we considered 
transmission probabilities that vary from one vertex to 
another, are possible also for the bipartite graph consid- 
ered here. The derivations are straightforward and we 
leave them as an exercise for the reader. 



B. Discussion 

One important result that follows immediately from 
Eq. ( |55| ) is that if the degree distributions are truly 
power-law in form, then there exists an epidemic tran- 
sition only for a small range of values of the exponent 
of the power law. Let us assume, as appears to be the 
case, that the exponents are roughly equal for men and 
women: a m — aj — a. Then Eq. ( p5| ) tells us that the 
epidemic falls on the hyperbola T m fTf m 



T c 2 , where 



T r = 



C(a-l) 



C(a-2)-C(a-l)' 



(56) 



where £(x) is the Riemann ^-function. The behavior of 
T c as a function of a is depicted in Fig. ^. As the figure 
shows, if a < 3, T c = and hence T m fTf m = 0, which is 
only possible if at least one of the transmissibilities T m f 
and Tf m is zero. As long as both are positive, we will al- 
ways be in the epidemic regime, and this would clearly be 
bad news. No amount of precautionary measures to re- 
duce the probability of transmission would ever eradicate 
the disease. (Lloyd and May Q have pointed out that a 
related result appears in the theory of fully mixed mod- 
els, where a heterogeneous distribution of the infection 
parameter (5 (see Eq. ([j])) with a divergent coefficient of 
variation will result in the absence of an epidemic thresh- 
old. Pastor-Satorras and Vespignani [|l] have made sim- 
ilar predictions using mean-field-like solutions for SIRS- 
type endemic disease models on networks with power-law 
degree distributions and a similar result has also been 
reported for percolation models by Cohen et al. @.) 
Conversely, if a > a c , where a c = 3.4788 ... is the so- 
lution of ((a - 2) = 2((a - 1), we find that T c = 1 
and hence T m fTf m = 1, which is only possible if both 
T m j and Tf m are 1. When either is less than 1 no epi- 
demic will ever occur, which would be good news. Only 
in the small intermediate region 3 < a < 3.4788 does the 
model possess an epidemic transition. Interestingly, the 
real-world network measured by Liljeros et al. j^] appears 
to fall precisely in this region, with a ~ 3.2. If true, this 
would be both good and bad news. On the bad side, it 
means that epidemics can occur. But on the good side, it 
means that it is in theory possible to prevent an epidemic 
by reducing the probability of transmission, which is pre- 
cisely what most health education campaigns attempt to 
do. The predicted critical value of the transmissibility 
is T c — 0.363 . . . for a — 3.2. Epidemic behavior would 
cease were it possible to arrange for the transmissibility 
to fall below this value. 

Some caveats are in order here. The error bars on the 
values of the exponent a are quite large (about ±0.3 Q). 
Thus, assuming that the conclusion of a power-law de- 
gree distribution is correct in the first place, it is still 
possible that a < 3, putting us in the regime where there 
is always epidemic behavior regardless of the value of the 
transmissibility. (The error bars are also large enough to 
put us in the regime a > a c in which there are no epi- 
demics. Empirical evidence suggests that the real world 
is not in this regime however, since epidemics plainly do 
occur.) 

It is also quite possible that the distribution is not a 
perfect power law. Although the measured distributions 
do appear to have power-law tails, it seems likely that 
these tails are cut off at some point. If this is the case, 
then there will always be an epidemic transition at fi- 
nite T, regardless of the value of a. Furthermore, if it 
were possible to reduce the number of partners that the 
most active members of the network have, so that the 
cutoff moves lower, then the epidemic threshold rises, 
making it easier to eradicate the disease. Interestingly, 
the fraction of individuals in the network whose degree 
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need change in order to make a significant difference is 
quite small. At a = 3, for instance, a change in the 
value k of the cutoff from k — oo to k = 100 affects 
only 1.3% of the population, but increases the epidemic 
threshold from T c = to T c = 0.52. In other words, 
targeting preventive efforts at changing the behavior of 
the most active members of the network may be a much 
better way of limiting the spread of disease than target- 
ing everyone. (This suggestion is certainly not new, but 
our models provide a quantitative basis for assessing its 
efficacy.) 

Another application of the techniques presented here is 
described in Ref. |4£j. In that paper we model in detail the 
spread of walking pneumonia (Mycoplasma pneumoniae) 
in a closed setting (a hospital) for which network data 
are available from observation of an actual outbreak. In 
this example, our exact solutions agree well both with 
simulations and with data from the outbreak studied. 
Furthermore, examination of the analytic solution allows 
us to make specific suggestions about possible new con- 
trol strategies for M. pneumoniae infections in settings 
of this type. 

VI. CONCLUSIONS 

In this paper, we have shown that a large class of the 
so-called SIR models of epidemic disease can be solved ex- 



actly on networks of various kinds using a combination of 
mapping to percolation models and generating function 
methods. We have given solutions for simple unipartite 
graphs with arbitrary degree distributions and hetero- 
geneous and possibly correlated infectiveness times and 
transmission probabilities. We have also given one ex- 
ample of a solution on a structured network — the spread 
of a sexually transmitted disease on a bipartite graph of 
men and women. Our methods provide analytic expres- 
sions for the sizes of both epidemic and non-epidemic 
outbreaks and for the position of the epidemic threshold, 
as well as network measures such as the mean degree of 
individuals affected in an epidemic. 

Applications of the techniques described here are pos- 
sible for networks specific to many settings, and hold 
promise for the better understanding of the role that net- 
work structure plays in the spread of disease. 
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One should observe that the network studied in Ref. |^ is 
a cumulative network of actual sexual contacts — it rep- 
resents the sum of all contacts over a specified period of 
time. Although this is similar to other networks of sexual 
contacts studied previously Ji(], |n| , it is not the network 
required by our models, which is the instantaneous net- 
work of connections (not contacts — see Section II). While 
the network measured may be a reasonable proxy for the 
network we need, it is not known if this is the case. 
It is also worth noting that networks of sexual contacts 
observed in sociometric studies |si| are often highly den- 
dritic, with few short loops, indicating that the tree-like 
components of our percolating clusters may be, at least 
in this respect, quite a good approximation to the shape 
of real STD outbreaks. 



